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Abstract We describe regularized methods for image reconstruction and focus on the 
question of hyperparameter and instrument parameter estimation, i.e. unsupervised and my- 
opic problems. We developed a Bayesian framework that is based on the posterior density 
for all unknown quantities, given the observations. This density is explored by a Markov 
Chain Monte-Carlo sampling technique based on a Gibbs loop and including a Metropolis- 
Hastings step. The numerical evaluation relies on the SPIRE instrument of the Herschel 
observatory. Using simulated and real observations, we show that the hyperparameters and 
instrument parameters are correctly estimated, which opens up many perspectives for imag- 
ing in astrophysics. 

Key words. Techniques: inverse problem, Bayesian regularization, hyperparameter esti- 
mation, instrument parameter estimation, semi-blind, myopic, autocalibration, image pro- 
cessing, deconvolution, super-resolution. 

1. Unsupervised myopic inversion 

The agreement of physical models and observations is a crucial question in astrophysics, how- 
ever, observation instruments inevitably have defects and limitations (limited pass-band, non- 
zero response time, attenuation, error and uncertainty, etc.). Their inversion by numerical pro- 
cessing must, as far as possible, be based on an instrument model that includes a description of 
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these defects and limitations. The difficulties of such inverse problems, and notably their often 
ill-posedness, were well identified several decades ago in various communities: signal and image 
processing and statistics, and also mathematical physics and astrophysics. It seems pertinent to 
take advantage of the knowledge amassed by these communities concerning both the analysis of 
the problems and their solutions. 

The ill-posedness comes from a deficit of available information (and not only from a "simple 
numerical problem"), which becomes all the more marked as resolution requirements increase. 
The inversion methods must therefore take other information into account to compensate for the 
deficits in the observations: this is known as regularization. Each reconstruction method is thus 
specialised for a certain class of objects (point sources, diffuse emission, superposition of the two, 
etc.) according to the information accounted for. Consequently, in as much as it relies on various 
sources of information, each method is based on a trade-off, which usually requires the setting of 
hyperparameters, denoted by £ in the following. The question of their automatic tuning, namely 
unsupervised inversion, has been extensively studied and numerous attempts investigate statis- 
tical approaches: approximated, pseudo or marginal likelihood, in a Bayesian or non-Bayesian 
sense, EM, SEM and SAEM algorithms, etc. The reader may consult papers such as (Zhou et al. 
1997; de Figueiredo & Leitao 1997; Saquib et al. 1998; Descombes et al. 1999; Molina et al. 
1999; Lanterman et al. 2000; Pascazio & Ferraiuolo 2003; Blanc et al. 2003; Chantas et al. 2007; 
Giovannelli 2008; Babacan et al. 2010; Orieux et al. 2010a) and reference books such as (Winkler 
2003, Part. VI), (Li 2001, Ch.7) or (Idier 2008, Ch.8). Alternative methods are based on the L- 
curve (Hansen 1992; Wiegelmann & Inhester 2003) or on generalised cross-validation (Golub 
et al. 1979; Fortier et al. 1993; Ocvirk et al. 2006). 

The construction of maps of high resolution and accuracy relies on increasingly complex in- 
struments. So, inversion methods require instrument models that faithfully reflect the physical re- 
ality to distinguish, in the observations, between what is caused by the instrument and what is due 
to the actual sky. Then, a second set of parameters comes into play: the instrument parameters, 
denoted by r] in the following, such as lobe width, amplitude of secondary lobes, response time, 
or gain. Their values are of prime importance and their settings are generally based on dedicated 
observation and rely on models and/or calibrations that inevitably contain errors. For example, 
the lobe widths are usually determined from a specific observation in a spectral band of non-zero 
width; consequently the result depends on the source spectrum. Correction factors can be applied 
but, naturally, they also contain errors when the source spectrum is poorly known or unknown. In 
contrast, our aim is to achieve myopic inversion, i.e. to estimate the instrument parameters with- 
out dedicated observation. The question arises in various fields: optical imaging (Pankajakshani 
et al. 2009), interferometry (Thiebaut 2008), satellite observation (Jalobeanu et al. 2002), mag- 
netic resonance force microscopy (Dobigeon et al. 2009), fluorescence microscopy (Zhang et al. 
2007), deconvolution (Orieux et al. 2010b), etc. A similar problem deals with non-parametric 
intrument response (blind inversion), for which the literature is also very abundant: (Mugnier 
et al. 2004; Thiebaut & Conan 1995; Fusco et al. 1999; Conan et al. 1998) in astronomy and 



F. Orieux et al.: Unsupervised and myopic inverse problems. 3 

(Lam & Goodman 2000; Likas & Galatsanos 2004; Molina et al. 2006; Bishop et al. 2008; Xu 
& Lam 2009) in the signal -image literature represent examples. The present paper is devoted to 
parameter estimation for the instrument model developed in our previous paper (Orieux et al. 
2012b), based on an accurate instrument model. 

A threefold problem has to be solved: from a unique observation, estimate the hyperparam- 
eters the instrument parameters and the map. This is referred to as unsupervised and myopic 
inversion. From the methodological point of view, the proposed inversion method comes within 
a Bayesian approach (Idier 2008). In this family, we find the classic Wiener and Kalman meth- 
ods that calculate the expectation or the maximizer of a posterior density. In an equivalent way, 
the Phillips-Twomey-Tikhonov methods calculate the minimizer of a least-squares criterion with 
quadratic penalization. These methods are based on a second-order analysis (Gaussian models, 
quadratic criteria) and lead to linear processing. The work proposed here is in a similar method- 
ological vein as far as estimating the map goes; however, the contribution concerns the estimation 
of the hyperparameters and instrument parameters. We resort to an entirely Bayesian approach 
(also called full-Bay es) that models the information for each variable (observations, unknown 
map as well as hyperparameters and instrument parameters) through a probability density. Based 
on an a posteriori distribution for all the unknown variables, the proposed method jointly esti- 
mates the instrument parameters, the hyperparameters, and the map of interest. Regarding ex- 
perimental data processing, the present paper follows (Orieux et al. 2012b) on inversion for the 
SPIRE instrument onboard Herschel, which requires the hyperparameters to be fixed by hand 
and the instrument parameters to be known. The proposed method can automatically tune these 
parameters and may permit the systematic and automatic processing of large information streams 
coming from present and future space-based instruments (e.g. Herschel, Planck, JWST, etc.). 

The paper is structured as follows. Section 2 introduces the notation and sets out the problem. 
Section 3 presents the inversion method: it introduces the prior densities and leads to the posterior 
density. Section 4 describes the computing method based on Markov Chain Monte-Carlo (Gibbs) 
stochastic sampling algorithms. The work is essentially evaluated on simulated observations and 
on a first set of real observations in the context of the SPIRE instrument onboard Herschel. The 
results are presented in Section 5. Finally, some conclusions and perspectives are provided in 
Section 6. 

2. Notation, instrument, and map models 

To produce accurate and reliable maps, the inversion must exploit a description that represents 
the acquisition process as faithfully as possible. In this sense, the instrument model 

- is based on a map of the sky noted X, which is naturally a function of continuous spatial 
variables (a, j3) <G R 2 (and possibly a spectral variable A e R+), 

- and accurately describes the formation of a set of N discrete observations grouped together 
in a vector y G R N . 



4 



F. Orieux et al.: Unsupervised and myopic inverse problems. 



A general description of the map of the sky as a function of continuous spatial variables can 
be written starting from a basic function ip by combination and regular shifting: 



The function tp must be chosen so that this decomposition can describe the maps of interest and 
is easy to handle. It may be, among other choices, a pixel indicator, a cardinal sine function, or 
a wavelet (although in the last case the function ip and the coefficients also depend on a scaling 
parameter). Whatever the choice, the map of interest is finally represented by its coefficients Xij, 
the number of which is arbitrarily large and collected in cc € R M in what follows. In practice, we 
choose the Gaussian family as this greatly simplifies the (theoretical and numerical) calculations 
of the model outputs, including for complex models (Orieux et al. 2012b; Rodet et al. 2008). 

The presented work is quite generic in the sense that it is not a priori attached to a specific 
instrument. It deals with a general linear instrument model that describes, at least to a fair ap- 
proximation, the physics of the processes in play: optics, electrics, and thermodynamics. It also 
includes the passage from a continuous physical reality to a finite number of discrete observa- 
tions. The instrument is then described by 



i.e. a general linear model w.r.t. x (a special case of which is the convolutive model). This model 
shows the instrument parameters rj e R K that define the form of the instrument response. The 
component n = y A v x represents the measuring and modelling errors additively. For the 
SPIRE instrument (Griffin et al. 2010) of the Herschel space observatory (Pilbratt et al. 2010) 
launched in May 2009, the paper of (Orieux et al. 2012b) gives the details of the instrument 
model construction. The results of Section 5 are based on this instrument. 

3. Probabilistic models and inversion 

The proposed inversion is developed in the framework of Bayesian statistics. It relies on the 
posterior density p(x,^,rj\y) for the unknown quantities x (image), £ (hyperparameters), and 
r] (instrument parameters) given y (observations). This density brings together the information 
about the unknowns in the sense that it attaches more or less confidence to each value of the 
triplet (x, £, rj). A summary of this density in the form of a mean and a standard deviation will 
provide (1) a point estimate (the posterior mean) for the map of interest and the parameters, and 
(2) an indication of the associated uncertainty (the posterior standard deviation). 

Remark 1. In statistical terms (Robert 2005), the posterior mean is an optimal estimator. More 
precisely, of all the possible estimators (whether Bayesian or not, empirical or not, a computation 
code, etc.), the posterior mean yields the minimum mean square error (MMSE) 1 . Regarding first- 
order statistics, this estimator has, moreover, a zero mean bias. 

1 The mean square error is the expected value of the squared norm of the difference between estimated 
value and true value. The expectation is under the distribution of the observation and the unknown. The MSE 




(1) 



y = A v x + n, 



(2) 
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The posterior density is deduced as the ratio of the joint density for all considered quantities 
p(x, £, rj, y) and the marginal density for the observations p(y) by application of Bayes' rule 

p(y) 

Seen as a function of the unknowns (x, £, rj), this posterior density is proportional to the joint 
density: 

p(x, £, n\v) oc p(x, £, t], y). (4) 

This joint density is essential as all the other densities (marginal, conditional, prior, posterior, 
etc.) can be deduced from it. It can be factorised in various forms and, in preparatio for the 
developments to follow, we write 

p{x, $, rj, y) = p(y\x, £, r])p(x, £, rj) 

= P(y\x, £, rj)p{x\£, rj)p(£, r)) 

= p{y\x,£,r])p(x\£)p(€)p(ri), (5) 

including the fact that (1) the hyperparameters £ and the instrument parameters rj are a priori 
independent and (2) the object x and the instrument parameters rj are also a priori independent. 

The different probability densities will be defined in the following sections according to the 
information available on each set of variables and according to practical concerns about dealing 
with the probability densities and numerical computation time. 

3. 1. Modelling of errors and likelihood 

The factor p(y\x,£,rj) in Eq. (5) is the density for the observations y given the map x, the 
instrument parameters rj, and the hyperparameters £, i.e. the likelihood of the unknowns attached 
to the observations. 

Given Eq. (2), the construction of this likelihood is based on the model for the error n. The 
analysis developed in this paper is essentially founded on its mean m n and its covariance matrix 
£„, and the proposed model is Gaussian: 

n~jV(m n ,S n ). (6) 

The choice of the Gaussian model is also justified via information property: based on the sole 
information of finite mean and covariance, the Gaussian density is the model that introduces the 
least information (Kass & Wasserman 1996). This property is also mentioned as a maximum 
entropy property of the Gaussian density. 

m n is a scalar that models a possible non-zero mean for the noise, such as an offset (in 
the proposed numerical evaluation of Section 5, one offset for each bolometer is introduced). 
Regarding the covariance matrix, to lighten the notation, we set S^ 1 = j n Il n : j n is a scale 

is the sum of the variance and the squared bias of the estimator (under the distribution of the observation 
and the unknown). 
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factor (called precision, homogeneous to an inverse variance) and U n contains the structure 
itself. For a stationary model II^ 1 has a Tceplitz structure, for an auto-regressive model TI n is a 
band matrix, for a white model U n is diagonal, for a white and stationary model Yl n = I, the 
identity matrix. In the developments below, the structure of U n is given while the scale factor 
7„ and the mean m n are unknown and included in the vector £. The results in Section 5 are 
presented for the case U n = I; hence j n is the inverse of the noise power. 

Remark 2. The proposed developments account for characteristics of the error n that may differ 
from channel to channel, sensor to sensor, etc. This will be the case in Section 5: a mean and 
power of the noise will be assigned and estimated for each bolometer. 

As the error n is Gaussian and additive (Eqs. (6) and (2)), the vector of observations y, given 
x, rj, is also Gaussian 

y\x,£,r) ~ 7V(m w |*,S„|*) 

with mean 

rriy^ = A v x + m n (7) 

and with the same covariance as n: = £„. So, the likelihood of the unknowns attached to 
the observations reads 



p(y\ X ,^r,) = (2n)- N ^ 2 det [H 

1 



1/2 

r 1 i 

(8) 



exp 



2 J n (y - m y \ t ) Tl n (y - m y \*) 



It includes the information provided by the observations as the transform of a map x by the 
instrument, taking its parameters rj and the noise parameters j n and m n into consideration. 

3.2. Prior density for the map and spatial regularity 

The aim of this section is to introduce a prior density p(x\£) for the unknown map coefficients 
x based upon available information about the map X. The present work is mainly devoted to 
extended emissions. From a spatial standpoint, such maps are relatively regular, i.e. they involve 
positive correlation. From the spectral standpoint, the power is mainly located at relatively low 
frequencies. The Gaussian density includes these second-order properties in a simple way. This 
choice can also be justified based on a maximum entropy principle. Its main interest here is to 
result in a linear processing method. It is written in the form 



p(x\ 7x ) = (2n)- M / 2 ^det [U x f 2 

1 



cxp 



2^fx x n x x 



(9) 



where -f x is a precision parameter (homogeneous to an inverse-variance that controls the reg- 
ularity strength) and JJ X is a precision matrix (homogeneous to an inverse-covariance matrix 
that controls the regularity structure). When the precision j x is low (strong prior variance), the 
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regularity information is weakly taken into account. Conversely, when the precision j x is high 
(weak prior variance), the penalization of non-regular maps is high, i.e. the regularity is strongly 
imposed. 

The subsequent developments are devoted to the design of Yi x to account for the desired 
regularity of the map. A simple regularity measure R c [X] of the map X is the energy of some 
of its derivatives. These derivatives can address the spatial variables (a, (3) separately, can rely 
on cross derivatives and can intervene at various orders. This is the classical Philipps-Twomey- 
Thikonov penalization idea (Tikhonov & Arsenin 1977). It can also embed directional derivatives 
or any differential operator (Mallat 2008). In the simplest and natural case, we choose 



R c [X] = 



dX 


2 


dX 


da 


+ 


d(3 



where ||u|| is the standard function norm 2 . Given the decomposition (1), it is easy to establish 
the partial derivatives of X from the derivatives of tp. In the direction a, by noting tp' a = dtp /da, 
we have 



tp' a [a - i S a ,f3- j Spj dad/3, 

which brings out the autocorrelation \I> a = tp' a * tp' a of the derivative of tp. We then have a 
quadratic form in x 



As the coefficients $> a — i) 5 a , (j' — j) 5p] depend only on the difference between indices, 
the matrix has a Tceplitz structure and the computations amounts to a discrete convolu- 
tion that can be efficiently implemented by the use of fast Fourier transform (FFT). Finally, 
by performing the same development in the (3 dimension, a global quadratic norm appears: 
R c [X] — ^(vE^ + *&p)x and designs the precision matrix Ila. = + ^Sfp. For more de- 
tails and for a spectral interpretation, see Section 2.1 and Appendix A of (Orieux et al. 2012b). 

3.3. Prior distribution for hyperparameters (hyperprior) 

The hyperparameters are the unknown parameters of the densities for the error and for map 
Eqs. (8)-(9) and they are collected in the vector £ = [m n , 7„, -f x ]. It has been said that -f x and 
7„ are the precisions (scale parameters) and m n is a mean (position parameter) of Gaussian 
densities. 

2 The function squared norm is defined by ||ii|| 2 = JJ u(ct, /3) 2 da d/3. 






[(»' - i) S a , (f - j) dp] = x** a x. 
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The choice of the prior distributions for these hyperparameters is driven by two requirements: 
(i) little information is available a priori on their values and their relations and (ii) the chosen 
distributions must lead to efficient algorithms (see section 4.2). Following this line of thought, 
we choose a prior distribution determined by Jeffreys' principle 3 : ^(7) = 1/7 for j x and j n 
and p(m n ) = 1. Moreover, regarding the triplet of hyperparameters £ = [m n ,'y x ,j n ], they 
are modelled as independent variables, since no information is available about their eventual 
relations. Finally 

PK^x^n) = 1/7*7™ (10) 

has two advantages 

1. First of all, the posterior conditional densities for 7 X and j n (resp. for m n ), as shown in 
section 4.2, will be gamma densities (resp. Gaussian density), which will make the imple- 
mentation easier. 

2. This prior distribution is non-informative (which introduces a minimum of information on 
the value of the hyperparameters) in the sense that it is invariant by certain parameterization 
changes (Robert 2005; Kass & Wasserman 1996). 

3.4. Prior density for the instrument parameters 

The instrument parameter rj operates in a complex nonlinear way in the description of the obser- 
vations. In consequence, whatever the prior density, the conditional posterior density for 77 (see 
section 4.3) will not have a standard form. The choice is thus purely oriented by the information 
on the instruments and the question that arises concerns the encoding of the available information 
in the form of a probability density. If we have no information except a minimum and a maximum 
value for a given parameter, the choice of a uniform density over the interval is a reasonable one. 
If we have a nominal value with an associated uncertainty and no other information, the most 
suitable choice is a Gaussian density. The rest of the development is valid whatever the choice, 
and we consider the Gaussian case in the following. 

In addition, having no information available about possible links among the various parame- 
ters, we take it that the parameters are, a priori, independent and thus 

K 

p(rj) = [](27r i0fe )- 1 / 2 cxp 

k=i 

In practice and for the first results presented in Section 5, the means [i k and variances p k were 
taken from the SPIRE observer manual or were fixed ad-hoc at plausible values. 



(Vk - MfcT 

2p fe 



(11) 



3 It yields a non-informative prior distribution based on a key feature that it is invariant under reparame- 
terization. It is deduced as the determinant of the Fisher information matrix. 
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Figure 1. Graphical dependency representation (hierarchical structure). The round (square) 
nodes correspond to unknown (fixed) quantities. The directions of the arrows indicate the de- 
pendencies. 



3.5. Posterior density: histograms, mean and standard deviation 

The posterior density (3) for all the unknowns x, rj, is deduced from the joint density (5) for 
all quantities concerned as follows: 



P(x, £, r)\y) oc p{y\x, £, v)p{x\€)p(€)p(v)- 



In this expression, 



- the density p(x\-f x ) for the unknown map x e R M is Gaussian (Eq. (9)); 

- the distribution (for £ = [m n ,-f n , -f x ]) is a Jeffreys' distribution (Eq. (10)); 

- the instrument parameters r] G H K is modelled by a Gaussian density p(rj) (Eq. (1 1)); 

- the density p(y\x, £, rj) for the observations y e R w given the rest of the variables (i.e. 
likelihood) is a Gaussian density (Eq. (8)) and it is a function of x through m„|, given 
byEq. (7). 

Finally, from equations (8), (9), (10), and (11), the posterior density can be written 



exp 



K 

k=l 

1 (r]k - Pk) 2 
2 



Pk 



exp 



exp 



2 TaJ *^ ElajX 



:ln(y - m 3/ | st )*n„(y - m y ^) 



(12) 



This density brings together all information about the unknowns, and the estimators and al- 
gorithms presented below are entirely based on it. However, it is too complex to be analyzed 
directly as a whole and the difficulty stems from (i) the dimension of x (of size M ~ 10 5 in 
practice) and (ii) the joint presence of other parameters (hyperparameters £ and instrument pa- 
rameters 77). Moreover, for the latter in particular, the dependence is complicated and cannot be 
identified with a standard form. The proposed approach is to explore the posterior density by 
means of stochastic sampling (Robert & Casella 2004; Gilks et al. 1996). The idea is to produce 
a set of samples x^ q \ r/ 9 ', for q = 1,2, ... ,Q, drawn at random under the posterior den- 
sity. It is then possible, for example, to deduce histograms that approximate marginal densities, 
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Table 1. Gibbs algorithm 

Initialize a^U^r/ ) 
for q = 1, 2, . . . do 

(1) sample x iq) under p{x\i {q ~ 1) , r] (q ' 1) , y) 

(2) sample £ (q) under p(£\x M , T7 (9_1) , y) 

(3) sample T7 (9) under p(r]\x (q) , £ (q) , y) 
end for 

together with means and standard deviations. This strategy is by no means new but interest in its 
practical use has revived in recent years as new forms of algorithms have been developed and 
computer power has increased. 

Concerning the estimates themselves for the map, the hyperparameters and the instrument 
parameters, we choose the posterior mean (PM), as indicated at the beginning of Section 3 (see 
also Remark 1). We will also look at the dispersion around the mean through the posterior stan- 
dard deviation (PSD) and the links among components through posterior correlations. Using the 
set of samples x^ q \^ q \r)^ g \ for q = 1,2, ... ,Q, the posterior mean fi P and the posterior 
covariance matrix T P are computed by 



where x denotes the column concatenation x = [x;£;r]]. Practically, it is not possible to compute 
the entire covariance Tp, but it is possible to compute its diagonal elements to characterize the 
marginal errors for each component (each pixel, hyperparameters, instrument parameters) and to 
compute a few nondiagonal elements to measure the correlations between components. 

4. Exploration of posterior density and the computation algorithm 

We have introduced an instrument model and various probability densities to define the posterior 
density that brings together the information on the map and the parameters (hyperparameters and 
instrument parameters). We have also defined the posterior mean (PM) as an estimate and the 
posterior standard deviation (PSD) as a measure of the uncertainty. We have then introduced the 
idea of computations via stochastic sampling. The developments in the present section concern 
the algorithm for computing these samples. 

The production of samples of the posterior density for the set (x, £, rj) is not possible directly 
because of the complexity of the density. We therefore use a Gibbs algorithm (Robert & Casella 
2004; Gilks et al. 1996), which breaks the problem down into three simpler subproblems: sam- 
pling x, and rj separately. This is an iterative algorithm, described in Tab. 1 : each variable x, 
£, and ij is drawn under its conditional posterior density given the current value of the other two 




(13) 




(14) 
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variables. For each of the three steps, this conditional posterior density can be deduced directly 
(up to a multiplying factor) from the posterior density (12): all we have to do is to keep only 
the factors depending on the variable of interest. This algorithm is a Markov chain Monte-Carlo 
(MCMC) algorithm (Robert & Casella 2004; Gilks et al. 1996) and is known to give (after a 
certain time, called the burn-in time) samples under the posterior density. 

The conditional density of the map coefficients x (Step (1), Tab. 1) is Gaussian (Section 4.1). 
For the precisions £ (Step (2), Tab. 1), the conditional densities are gamma densities (Section 4.2). 
They will be sampled using standard existing numerical routines (e.g. in Matlab). In contrast, the 
conditional density of the instrument parameters r\ (Step (3), Tab. 1) has a much more complex 
nonstandard form, so that it cannot be directly sampled by existing routines. To overcome this 
difficulty, sampling was carried out by means of a Metropolis-Hastings step (Section 4.3). 



4. 1. Map sampling 

The density for the map x conditionally on the other variables is deduced from (12) by extracting 
the factors depending on x: 



p(x\y^x,ln,v) oc exp 



(15) 



Considering the expression for m s |, given by Eq. (7), the argument of this exponential is 
quadratic in x. We deduce that we have a Gaussian density and, by rearranging the argument, we 
can determine the covariance and the mean 

= (7„Aj ) n„A 77 + lx n x y 1 (16) 

^a-i* = 7 n E x |*Aj J II n y, (17) 

Remark 3. For a fixed value of the hyperparameters and the instrument parameters, the map 
mj,|, defined by Eqs. (16)-(17) is the maximizer (and the mean) of the conditional poste- 
rior density (15). This is the regularized least-squares solution denoted x([i) parameterized by 
\i = jx/lrf This corresponds to the solution defined in our previous paper (Orieux et al. 2012b). 
For a convolutive instrument model, it is the Wiener solution (also called Wiener-Hunt solu- 
tion) (Orieux et al. 2010b). 

Step (1) of Tab. 1 consists of sampling this Gaussian but this operation is made very difficult 
by three elements: (i) the large size of the map, (ii) the correlation introduced by the instrument 
model and the prior density, and (iii) the absence of structure of the instrument model (invari- 
ance, sparse nature). This problem can be solved using several approaches. For a convolutive 
instrument model (2), it is possible to approximately diagonalise the correlation matrix by FFT, 
thus producing a sample for the cost of an FFT (Chellappa & Chatterjee 1985; Chellappa & 
Jain 1992; Geman & Yang 1995; Giovannelli 2008; Orieux et al. 2010b). If where the inverse 
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of the correlation matrix is sparse, a partially parallel Gibbs sampler may be particularly effi- 
cient (Winkler 2003, Chap. 8). In the present case, neither the correlation nor its inverse possess 
the required properties. A general solution relies on factorizing the correlation matrix (Cholesky 
decomposition, diagonalization, etc.) but the large size of the matrix (M x M with M <~ 10 5 ) 
does not permit the required calculations to be performed here. 

The proposed solution consists of constructing a criterion such that its minimizer is a sample 
under the desired posterior conditional density. To do this, we perturb the means of the noise 
component and of the map component by an additive component with covariance 7 X II X and 
7 Ti n„. A perturbed regularized least squares criterion is then introduced 



is Gaussian and does indeed have the correlation and mean defined by (16) and (17). This very 
powerful result has already been used by (Feron 2006; Orieux et al. 2012b). In different forms, 
similar ideas have been introduced and used by (Rue 2001; Rue & Held 2005; Lalanne et al. 
2001; Tan et al. 2010). 

Remark 4. For the non perturbed criterion (y = y and rh x — 0), we have the regularized least- 
squares solution Eqs. (16)-(17), that was mentioned in Remark 3. 

Remark 5. The approach described in (Orieux et al. 2012a) involves the sampling of the prior 
density (9) that is not properly defined here: the matrix Yi x does not penalise the mean of the 
map (it is of deficient rank). But, for the same reason, the solution (18) does not depend on the 
mean of the realization of the prior density. Therefore, the simulated sample can have an arbitrary 
mean value. 

4.2. Hyperparameter sampling 

To determine the posterior conditional density for j x , we examine the posterior density (12), and 
only keep the factors where j x appears, which gives 



J(x) = 7n (y - A v xf U n (y - A n x) + 



7 X (x - rrixf U x (x - m x ) 



and it can be shown (see (Orieux et al. 2012a)) that its minimizer 



x = (^A^UnAr, + j x U x ) 



-l 



(7 n ^II n y + -y x Il x m x ) (18) 



P(j x \y,x 7 j n7 q) oc 7. 



,M/2-l 



x I I 

CXp [-y||aj 



and we recognize a Gamma density (see Appendix A) 



lx ~Q(M/2, 2/11*11^). 



(19) 



Concerning j n , we also refer to the posterior density (12) and find 
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Table 2. Step of the Metropolis-Hastings sampler, which replaces Step (3) of Tab. 1. The current 
sample at step q is r]^ and it is either replaced or not by the proposed sample r] p . 

(a) Draw a sample rf under a proposal density 

(b) Compute the acceptation ratio p by Eq. (23) 

(c) Replace r/ q) by rf (i.e. ?7 (9+1) = rf) with the probability min(l,p), otherwise keep r/ 9 - 1 (i.e. 
J7 (9+1) = t|<«>). 

which is also a Gamma density 

7n~G(N/2, (20) 

For both "f x and 7 n the second parameter of the Gamma density introduces a quadratic norm 
(regularity of the map in Eq. (19), and goodness-of-fit in Eq. (21)), which can be easily computed. 

Remark 6. An intuitive interpretation can be given to these results starting from the fact that 
the mean of the Gamma density is equal to the product of its parameters (see Appendix A), 

II 1 1 2 

here N/ ||y — m s |, || n for (21). In this sense, the conditional posterior mean is the inverse of 
the empirical variance of the residuals. Consequently, when the goodness-of-fit term is small, the 
mean of the density is large and so the sampled value of j n is also high reporting a high precision, 
i.e. a weak variance (and vice versa). THe same holds for the map regularity in relation with the 
mean of the density (19) given by Mj \x\^^. These observations support the coherence of the 
model and reinforce the prior choice for these hyperparameters as a convenient one. 

Regarding the mean of the noise, m n , it is a scalar whose posterior conditional density is 
also deduced from the posterior density (12) and from (7) 

p{m n \y,x,i n ,i x ,r]) oc exp 

where m r is the empirical mean of the residuals y — A v x. We then have a Gaussian density 

m n ~N{m r ,i n ). (21) 

In the numerical evaluation of Section 5, one such mean is estimated for each bolometer. 

Remark7. If we examine the relationships above, we see that p(-f x \y,x,-f n ,ri) = p{^ x \x), 
in other words, -f x and (y,7n,^) are independent conditionally on x. Similarly, we note that 
p(jn\y, x, j x , T]) = p(jn\y, x, rt), which means that -f n and j x are independent conditionally 
on (y, x, rf). In addition, m n is independant of j x , given y, x, j n , rj. 



1 



ln{m n - m r ) 2 



4.3. Instrument parameter sampling 

The last step (Step (3) of Tab. 1) is more complex. As for the other variables, the posterior 
conditional density can be deduced from the posterior density (12) by keeping the factors that 



14 F. Orieux et al.: Unsupervised and myopic inverse problems. 

bring in rj. There are two of these: the likelihood and the prior density, and we thus have 

1 (Vk - Pk) 2 



P{ri\y,x,£) oc Yl cxp 



fe=i 



2 Pk 



exp 



-\ln{y - A v xYU n (y - A v x) 



(22) 



However, this is not a usual density, notably because there is no simple mathematical form to 
represent the dependence of the observation w.r.t. r]. Thus, Step (3) of Tab. 1 cannot be carried 
out directly with standard sampling routines and we resort to a Metropolis -Hastings step in a 
random-walk version (Robert & Casella 2004; Gilks et al. 1996) described in Tab. 2. It can be 
briefly explained as follows. Because it is impossible to draw a sample directly under the con- 
ditional posterior density (22), a sample is drawn under another density (namely the proposal 
density), but is not systematically accepted. Acceptance or rejection is also random with a pre- 
cisely defined probability (see Eq. (23)) to ensure that, at convergence, we have samples under 
the target density (Robert & Casella 2004; Gilks et al. 1996). The algorithm is divided into three 
sub-steps summarized in Tab. 2 and detailed here. 

(a) Draw a proposal rj p as a perturbation of the current value: r] p = rj^ + e, deduce the 
instrument matrix A^v, and the corresponding model output m p = A vP x^ + m n . 

(b) Compute the acceptation ratio 

= p(ri p \v,x,£) (23) 
p(r](i) \y, x, £)' 

based on the conditionnal posterior law ratio that compares the goodness-of-fit for the current 
parameter and the proposed one. 

(c) Accept or reject the proposal, at random, with probability min(l, p). To do so, draw u uni- 
formly in [0, 1] and take 



,,(9+1) 



t] p if u < min{l, p] 
j/ 9 ) otherwise. 



These three substeps are inserted instead of Step (3) of Tab. 1 . 

The algorithm can be explained as follows. Starting with a current value rj^ q \ the algorithm 
proposes a new value rj p and compares the goodness-of-fit for the two values. When the proposed 
value improves the fit, p > 1 and ?7 P is accepted. When the proposed value degrades the fit, r/ p 
can be accepted or rejected, with a probability that is higher or lower, depending on how weak 
the degradation is. 



Remark 8. There are other more complex (and potentially more efficient) approaches for 
Metropolis-Hastings sampling. In particular, the proposal density can be adapted, e.g. direc- 
tional random walk (Vacar et al. 2011)). They are not exploited here but are considered in the 
development perspectives. 
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5. Numerical results 
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The previous sections presented the approach for building the posterior density and for its explo- 
ration by stochastic sampling using a Gibbs algorithm including a Metropolis-Hastings step. The 
mean and standard deviation (SD) of the posterior density are numerically computed as empiri- 
cal averages based on simulated samples, from relations (13) and (14). The developments below 
show the practicability of the proposed method (models, estimate and algorithm), and provide a 
first numerical evaluation. 



5. 1. Evaluation methodology 

The evaluation is based on the SPIRE instrument (Griffin et al. 2010) of the Herschel space ob- 
servatory (Pilbratt et al. 2010) launched in May 2009. It focuses on the PMW channel (centred 
around 350 /im) and the Large Map protocol in the nominal operating conditions: scan back and 
forth with constant speed (30 "/sec) over two almost perpendicular directions (88°). The scans 
are associated with a high sampling frequency (F s « 30 Hz) providing spatially redundant obser- 
vation and Fig. 2 shows the corresponding redundancy/pointing map. The spatial shift between 
basis functions (see Eq. (1)) is fixed at S a = Sp = 2 ", based on our earlier work (Orieux et al. 
2009, 2012b), to obtain the best gain in resolution without important increase of the computa- 
tional cost. The angular size of the reconstructed map is 20 ' x 20 ', i.e. a map of 600 x 600 
coefficients. The associated direct model, including the whole acquisition chain (scanning strat- 
egy, mirror, horns, wavelength filters, bolometers, and electronics) is detailed in our previous 
paper (Orieux et al. 2012b) and represented by Eq. (2) of the present paper. 




Figure 2. Redundancy/pointing map associated with our experiment of five crossed scans. 



The unsupervised method is assessed based on two synthetic maps of extended emission (the 
Galactic Cirrus (Fig. 3(e)) and a realization of the prior density Eq. (9)) as well as based on a real 
observation (reflection nebula NGC7023, Fig. 7). The paper also proposes a first assessment of 
the unsupervised and myopic approach based on a synthetic map with broader spectral content 
(the Galactic Cirrus with point sources (Fig. 3(e))). 

In the simulated cases, a zero-mean white Gaussian noise is added to the model output. 
Moreover, in these cases, since the original map (the "sky truth" denoted by x*) is known, the 
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quality of the reconstruction (denoted by x) can be quantified through an error: 



%I 2 /El4l 2 



(24) 



where only coefficients in the observed area are taken into account, allowing assessments and 
comparisons between methods. 

5.2. Algorithm behaviour and general comments 

As explained in Section 4, the algorithm provides a series of samples that form a Markov chain 
for hyperparameters, instrument parameter and map. The MCMC theory then ensures that it cor- 
rectly explores the parameter space and produces a density of samples reflecting the posterior 
density. Practically, the algorithm has been executed for the unsupervised problem as well as 
for the unsupervised and myopic problem. It has been run several times (1) using identical ini- 
tial conditions and (2) using different initial conditions. In both cases, the same qualitative and 
quantitative behaviour as presented here has been systematically observed. 

The computation time takes about one hour for the unsupervised (nonmyopic) case and about 
ten hours for the unsupervised and myopic case. The main computational cost is due to compu- 
tatingq the instrument model output given by Eq. (2). 

Figs. 6, 7, and 8 present some typical elements of the algorithm operation: visualization of 
progression, convergence phase (burn-in period), stable phase, etc. The evolution of the chain 
is shown for hyperparameters (Figs. 6 and 7) and for instrument parameter (Fig. 8). It is thus 
possible to grasp how the parameter space is explored. 

5.3. Unsupervised approach 

5.3.1 . Assessment of map estimation 

The qualitative and quantitative assessment of the reconstructed maps is presented here for the 
Galactic Cirrus; the first results are shown in Fig. 3. 

- The unsupervised method (proposed method) is outlined in Fig. 3(a). The hyperparameters 
are automatically set (without knowing of the sky truth). 

- The best-supervised method is outlined in Fig. 3(b). The hyperparameters are set by hand 
to minimize the error £ (knowing the sky truth). It is referred to as the best map and was 
previously presented in our paper (Orieux et al. 2012b). 

- The naive map (coaddition) and the true map are shown in Figs. 3(d) and 3(e). 

- In addition, Fig. 3(c) gives spatial profiles (vertical in the middle of the map) and Fig. 3(f) 
gives the spectral 4 profiles. 

4 This spectrum is computed from the FFT-2D of the map by averaging the coefficients in regularly 
spaced concentric rings. This gives a ID spectrum containing the isotropic approximation of the spectral 
map properties. 
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As expected and shown in (Orieux et al. 2012b), the inversion based on an accurate instrument 
model considerably improves the quality of the map: see the proposed map and the best map 
compared to the naive map and to the true map. The proposed map is visually very similar to the 
true map. In particular, our method restores details of small spatial scales (with spectral extension 
from null to high frequency) that are invisible on the naive map but are present on the true map 
(see Fig. 3(c)). In addition, our method correctly restores the structures of large spatial scales 
(low frequencies) and also the mean level of the map (null frequency), i.e. the photometry. 




t; 





— Proposed 

— Best 
— True 















(a) Proposed map 



(b) Best map 



100 200 300 

(c) Profile 




(d) Naive map 



(e) True map 



f [1/arcsec] 
(f) Spectral density 



Figure 3. Comparison of reconstructed map for the Galactic Cirrus: proposed map (Fig. 3(a)), 
best map (Fig. 3(b)), naive map (Fig. 3(d)), and true map (Fig. 3(e)). Fig. 3(c) shows a profile 
(marked by the white line in Fig. 3(e)) and Fig. 3(f) the spectrum (circular means of power 
spectra). Uncertainties are given in Fig. 4 and quantitative results are given in Tab. 3. Comments 
are given in section 5.3.1. 



To assess the pertinence of estimating j n and j x in terms of map quality, we compared the 
proposed map with the best map. They are visually very similar (see Figs. 3(a) and 3(b)). In the 
quantitative terms given by Tab. 3, the best map produces an error £ of 3.12% and the proposed 
map produces an error £ of 3.47%, which is only slightly higher. In other words, the proposed 
unsupervised method automatically (without knowing the sky truth) determines hyperparameters 
that produce a map almost as good as the best map (which requires knowing the sky truth). 

However, the proposed map shows a fine grainy texture that is visible neither in the true 
nor in the best map. This feature is also visible on the residual map of the coefficiens (Fig. 5). 
This is also observable in Fig. 3(f): the spectrum of the proposed map passes above the spec- 
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±3<T 



10"' 

100 200 300 f[1/arcsec] 

(a) Map of PSD (b) True map and PM ± PSD (c) True spectrum and 

PM ± PSD 

Figure 4. PSD and quantification of uncertainties. Fig. 4(a) shows the map of the PSD and 
Figs. 4(b) and 4(c) show the interval around the estimated map ± PSD and the true map. In 
the spatial domain, Fig. 4(b) is a profile (marked by the white line on 3(e)) and in the spectral 
domain Fig. 4(c) is the circular means of the power spectra. 




x10~ 6 x10" 6 x10" 6 




(a) Proposed map residuals (b) Best map residuals (c) Naive map residuals 

Figure 5. Residuals of maps in the central part of measured coefficiens. This illustrates the grainy 
structure of the proposed map wrt. best map. The naive map residuals suffer twice more errors 
and a squared feature caused by the pixel model. 





Reconstruction error £ 


Unsupervised ("%,) 


0.016 % 


Best supervised (7^ est ) 


0.0129 % 


Naive map 


0.0435 % 



Table 3. Comparison of reconstruction error £ (see Eq. (24)) for the Galactic Cirrus (the error E 
only accounts for the observed area of the map). The proposed approach (which does not require 
knowing the true map) produces an error only very slightly higher than the best map (which does 
require knowing true map). 



trum of the true map in the spectral band 0.025-0.035 arcseconcP 1 . This defect is related to a 
slight overevaluation of the observation contribution with respect to the prior contribution. It is 
referred to as under-regularization and yields an overamplification of the observation in this spec- 
tral band. This confirms the behaviour previously observed in deconvolution (Orieux et al. 2010b; 
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Giovannelli 2008) or noted for the maximum likelihood (Fortier et al. 1993). Nevertheless, it is 
remarkable that this defect is correctly notified by the PSD, as explained in the next paragraph. 

Indeed, the approach naturally provides a measure of reliability through the PSD shown in 
Fig. 4. Two zones can be seen in Fig. 4(a), in accordance with Fig. 2: the central zone (where 
observations are available) and the peripheral zone (extrapolated from observations of the central 
zone and based on the prior regularity). The boundary between the two zones also exhibits the 
variation of the observation hit and scanning strategy notably well. In addition, the posterior 
standard deviation also illustrates the difference between the zones observed with our without 
cross scan. From a spatial standpoint, Fig. 4(b) shows an interval around the estimated map with 
plus/minus PSD. The main result is that the true map is clearly within the interval. In a similar 
way, from a spectral standpoint the results are given by Fig. 4(c) (in relation with Fig. 3(f)): 
the true spectrum is also within the interval. More specifically, incorrectly reconstructed in the 
spectral band (above 0.025 arcsecond -1 ), the stronger PSD clearly shows that the estimated 
spectrum is certainly submitted to unsatisfactory errors or confidence. 

5.3.2. Assessment of hyperparameter estimation 




(a) 7„ chain (b) 7^, chain 

Figure 6. Chains and histograms for j n , Fig. 6(a), and 7^, Fig. 6(b), for the Galactic Cirrus. The 
chains show the burn-in period (about 1000 iterations) and the steady state. The corresponding 
histograms are computed on steady state only. 

This section assesses the unsupervised capabilities through evaluating the hyperparameter es- 
timation using the Galactic Cirrus and a realization of the prior for the map (which makes a true 
value 7* available). Fig. 6 shows the chains and the histograms that approximate the marginal 
posterior densities p(j n \y) and p(~y x \y)- In both cases, the histogram is relatively narrow al- 
though the prior is a wide non-informative Jeffreys' distribution (see Eq. (10)). In other words, 
the observations are sufficiently informative to quantify noise and regularity level and the method 
is able to capture this information. 

From a quantitative standpoint, results are given in Tab. 4. For the Galactic Cirrus and prior 
realization, the estimated values j n are very similar to the true value 7^ (error is less than 1 %). 
Moreover, the PSD are very low (0.40 %). In the case of prior realization, the estimated value % 
is in the correct range but the error is larger (about 17 %) and the PSD is 1.7 %. This difference 
can be naturally explained by two elements: (i) the noise is added at the system output, so it is 
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directly observed, whereas the map is at the system input, i.e. indirectly observed; (ii) the added 
noise is a realization of the prior density for the noise while the Galactic Cirrus is not a realization 
of the prior density for the map. 





7n 


7n 


PSD 




% 


PSD 


7^ est 




Cirrus 


10 6 


1.009 x 10 6 


4.07 x 10 3 




8.99 x 10" 


2.46 x 10 10 


2.47 x 10 12 


1 


Prior 


10 6 


1.003 x 10 6 


4.05 x 10 3 


4 x 10 11 


3.28 x 10 11 


1.07 x 10 10 


8.37 x 10 11 





Table 4. Hyperparameter estimation: true values, estimates, PSD and best values. 



5.3.3. Real observation processing 

This section proposes a first assessment for a real observation. It is based on the reflection nebula 
NGC 7023 acquired during the science demonstration phase of Herschel, which as been presented 
in (Abergel et al. 2010) and was processed in our previous paper (Orieux et al. 2012b). There 
and here, computations are made on the level- 1 files processed using HIPE. In our previous paper 
(Orieux et al. 2012b), the offsets were removed in a pre-processing step and the regularization 
parameter was tuned by hand compromise between gain in resolution and overamplification of 
the observations. In contrast, here both are automatically tuned. 

Fig. 7 presents the evolution of the chains for the hyperparameters: Fig. 7(a) for the noise 
parameter j n and Fig. 7(b) for the image parameter j x . It is important to notice that the algorithm 
behaves in a very similar manner for the real observation and for the simulated observation 
(see Fig. 7 compared to Fig. 6). The figures also give an empirical indication of the algorithm 
operation: after a burn-in time (empirically less than about 500 iterations) the stationary state 
is attained and the chain remains in a steady state: the samples are drawn under the posterior 
density. Concerning the offsets, the chains begin in the steady state, thanks to a good initialization 
based on (Orieux et al. 2012b) results. All bolometer offsets behave in the same manner with two 
example illustrated Figs. 7(d) and 7(e). The empircal mean of the offsets sample start to stabilize 
at approximately 500 samples. 

Fig. 7(c) shows the corresponding reconstructed map. Its quality is equivalent to the quality 
of the map restored by empirically tuning the hyperparameter presented in (Orieux et al. 2012b), 
Fig. 8. In other words, the proposed unsupervised method automatically determines hyperparam- 
eters (noise power and offsets as well as sky power) that produce a map almost as good as the 
map produced by a hand-made hypermarameter tuning. In addition, the map remains far better 
than the naive map shown in Fig. 7(f). 

5.4. Myopic and unsupervised approach 

The myopic and unsupervised question is a threefold problem that is much more ambitious: 
estimate the instrument parameter, the hyperparameters, and the map itself from a unique ob- 
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(c) Reconstructed map 




(d) Offset chain for the first (e) Offset chain for the second (f) Naive map (coaddition) 



bolometer 
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Figure 7. Results for real observation processing (reflection nebula NGC 7023). Chains for the 
noise parameter (j n ) and for the image parameter (7^) in Figs. 7(a) and 7(b). The stationary state 
is attained after a burn-in time of about 500 iterations. Fig. 7(c) shows the corresponding map. 
Figs. 7(d) and 7(e) illustrate chains and marginal histogram for two bolometer offsets. 



servation. In addition, the instrument parameter intervenes in a complex way in the description 
of the observations, and moreover, the problem is stated in a context that is doubly delicate: 
ill-posedness and high-resolution. 

In (Orieux et al. 2012b), the equivalent PSF has a Gaussian shape whose standard deviation is 
proportional to the wavelength: <r (A) = ?/A. It is then integrated w.r.t. the wavelength (to include 
the spectral extend) and w.r.t. the time parameter (to account for the bolometer response) to form 
the global instrument response. To test the method, we consider the instrument parameter 77 to be 
poorly known and introduce elements of the feasibility to estimate it. 

The prior is the Gaussian density given by Eq. (11), with K = 1. Its mean is taken from 
the SPIRE observer manual /i = 2.96 x 10 4 ["/m] and its standard deviation is set to p = 10 4 , 
i.e. a relatively large uncertainty. It is about 33% of the mean and an equivalent prior interval 
is [0.96 x 10 4 ,4.96 x 10 4 ] in a two-standard-deviations sense. Two cases are investigated for 
the true value (used to simulated observations) : — 2.46 x 10 4 and = 3.46 x 10 4 . The 
conditional posterior for 7/ (section 4.3) does not have a standard form and its sampling (step (3) 
of Tab. 1) relies on a Metropolis-Hastings sampler, itself based on a random-walk with a Gaussian 
excursion. The size of the excursion was chosen so that the acceptation rate is around 50%. Two 
maps are used for the observation: the Galactic Cirrus and the Galactic Cirrus with point sources. 
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In each case, the algorithm was run several times from identical and different initializations, and 
shows similar qualitative and quantitative behaviours as those in Fig. 8. 

6* 10 * ■ ■ . 6" ">' ■ . 

Chain Cham 

True True 

5 Mean 5 '. • • 



MeahT-sigma Mean--sigma 




500 1000 1500 2000 500 1000 1500 2000 



(a) Case 1, parameter (b) Case 2, parameter 
chain chain 

Figure 8. Instrument parameter chain (myopic and unsupervised approach) for the of Galactic 
Cirrus with point sources. Left (right) part of the figure deals with case 1, i.e. 77* = 2.46 x 10 4 
(case 2, i.e. 77*. = 3.46 x 10 4 ). The horizontal axis gives the iteration index and the vertical range 
is the prior interval in a two-standard-deviations sense. The true value is shown by the straight 
line. 



Case 


* 

V 


V 


77-77* 




a 


1 


2.46 x 


10 4 


2.29 x 


10 4 


-1.65 x 10 3 


6.7% 


2.2 x 10 2 


2 


3.46 x 


10 4 


3.27 x 


10 4 


-1.86 x 10 3 


5.4% 


2.9 x 10 2 



Table 5. Quantitative evaluation of the estimation of the instrument parameter using the Galactic 
Cirrus with point sources. Prior mean and standard deviation are /j = 2.96 x 10 4 and p = 10 4 . 



Nevertheless, as expected, the spectral content of the Galactic Cirrus is not sufficiently ex- 
tended towards high frequencies to provide an excitation that is adequate for instrument identifi- 
cation. In contrast, the Galactic Cirrus with point sources is more extended and estimations are 
more accurate. Tab. 5 presents quantitative assessments. The main result is that the estimation 
error is about 6%. It is a remarkable result given the difficulty of the problem (triple problem, 
complex relations, ill-possedness, and high resolution) and given that the prior uncertainty is 
about 33%. In other words, the method is able to capture information about instrument param- 
eter, jointly with noise level, regularity level, and map from a unique observation. However, the 
parameter 77 seems to be slightly underestimated which, we explain as follows. The input map 
(with point sources) presents a broad spectral extent whereas the prior favours spatially extended 
maps (dominated by relatively low frequencies), so the posterior advocates a narrower PSF to 
compensate for this spectral discrepancy. 

Figs. 9(a) and 9(b) show the related maps. They must be compared to the map restored with 
the true instrument parameter and the best hyperparameter presented in Fig. 7(b) of (Orieux et al. 
2012b) and in Fig. 9(d) here. They must also be compared to the true map and the naive map also 
given in (Orieux et al. 2012b) and in Figs. 9(c)-9(e) here. As previously, the proposed maps show 
a fine grainy texture but despite this defect, they remain similar to the true map. The quality of 
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(a) Proposed case 1 (b) Proposed case 2 




(c) True map (d) Best (e) Naive map 

Figure 9. Restoration of cirrus superimposed on point sources. The proposed maps must be com- 
pared to the maps restored with the true instrument parameter and the best hyperparameter and 
with the naive map. 

the proposed maps is similar to the quality of the map restored with the true instrument parameter 
and the best hyperparameter. In addition, several point sources of the true map are visible on the 
proposed maps but not on the naive map. In other words, the proposed method automatically 
determines instrument parameter and hyperparameters that produce a map almost as good as the 
best one and better than the naive map. 



6. Conclusion 

We described regularized methods for image reconstruction and focused on parameter estima- 
tion: 

- hyperparameters, which guide the trade-off between prior-based and observation-based in- 
formation, 

- instrument parameter, which tunes the physical characteristics of the model of the acquisition 
system. 

They were jointly estimated with the map of interest. We were therefore dealing with an unsu- 
pervised and myopic inverse problem. 

The most delicate point is jointly handles the different types of variables and their interactions 
in direct terms but, above all, in inverse terms. From a methodology point of view, we worked 
in the framework of hierarchical full Bayes strategies that model the available information for 
each set of variables (map, hyperparameters, instrument parameter, and observations) under a 
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probability density. We defined the posterior density, which gathers the information on the map 
of interest and the parameters, given the observations. We then defined the posterior mean as 
an estimate of the map and the posterior standard deviation as a measure of uncertainty, which 
gives an uncertainty map. This approach makes it possible to work in a global and consistent 
framework to solve the problem as a whole. It draws its inspiration from our earlier works on 
deconvolution (Orieux et al. 2010b) and adapts them to the case at hand. 

The posterior density was explored by stochastic sampling using a Gibbs algorithm. The 
sampling of the map was difficult: we are dealing with a large-sized multivariate normal den- 
sity for which classical techniques do not apply. We overcame this difficulty by constructing a 
sample as the minimizer of a well-chosen perturbated criterion (Orieux et al. 2012a). Another 
problematic point is the instrument parameter sampling: we are dealing with a very complex, 
nonstandard density. This difficulty was overcome by means of a Metropolis-Hastings step. The 
estimate of the map as well as the parameters (posterior mean) and the uncertainties (posterior 
standard deviation) were calculated numerically as empirical averages based on the simulated 
samples. 

We presented a first application of the developments (Bayesian estimation method and 
stochastic sampling algorithm) in a real context: the SPIRE instrument of the Herschel space 
observatory. The study was essentially performed on simulated observations and has also yielded 
some initial results on real observations. We concluded that the approach is applicable and en- 
ables joint estimation of the map, the hyperparameters, and the instrument parameter from a 
unique observation. We showed, among other results, that the quality of the proposed map is 
similar to that obtained when the instrument parameter is known and the hyperparameters are 
fixed by hand in a supervised way (using the sky truth). The method shows remarkable results 
given the difficulty of the problem. It seems to us that these initial results are particularly promis- 
ing and worth developing. They may open up many new perspectives for imaging in astrophysics 
in a myopic and unsupervised framework. 

Appendix A: Gamma probability density 

The gamma pdf for 7 > 0, with given parameters a > and b > 0, is written 

g(7K h) = b^ia) 1 ^ 1 CXP ( ~ 7/6) • (A ' 1} 

The following properties hold: mean is Eg [7] = ab, variance is Yg [7] = ab 2 and maximizer is 
b(a — 1) if and only if a > 1. 
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